# Monocular depth estimation

## Distill Any Depth Large Hf

xingyang1 · MIT · 3D Vision, Transformers · 2,322 downloads · 2 likes

Distill-Any-Depth is a state-of-the-art monocular depth estimation model trained with knowledge distillation.
## Depthmaster

zysong212 · Apache-2.0 · 3D Vision, English · 50 downloads · 9 likes

DepthMaster is a single-step diffusion model that adapts generative features from diffusion models for discriminative depth estimation.
## Coreml DepthPro

KeighBee · 3D Vision · 17 downloads · 4 likes

DepthPro is a monocular depth estimation model that predicts depth from a single image, packaged here in Core ML format.
## Depth Anything V2 Base Hf

depth-anything · 3D Vision, Transformers · 47.73k downloads · 1 like

Depth Anything V2 is a state-of-the-art monocular depth estimation model trained on 595,000 synthetically annotated images and over 62 million real unlabeled images, offering finer detail and stronger robustness than V1.
## Depth Anything V2 Large

depth-anything · 3D Vision, English · 130.54k downloads · 94 likes

The Large variant of Depth Anything V2, trained on the same large-scale mix of synthetic and real images, providing fine depth detail and high robustness.
## Depth Anything V2 Small

depth-anything · Apache-2.0 · 3D Vision, English · 55.22k downloads · 64 likes

The Small variant of Depth Anything V2. Compared with V1, it captures finer details and is more robust.
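
The three Depth Anything V2 entries above carry the Transformers tag, so they can be driven through the Hugging Face `depth-estimation` pipeline. Below is a minimal sketch, assuming the Small variant is published under the checkpoint id `depth-anything/Depth-Anything-V2-Small-hf` and that a local `room.jpg` exists:

```python
# Minimal sketch: relative depth via the Hugging Face depth-estimation pipeline.
# The checkpoint id is assumed from the listing; substitute the variant you need.
from transformers import pipeline
from PIL import Image

pipe = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

image = Image.open("room.jpg")          # any RGB image (hypothetical filename)
result = pipe(image)

# The pipeline returns a normalized depth map as a PIL image plus the raw tensor.
result["depth"].save("room_depth.png")
print(result["predicted_depth"].shape)
```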
## Coreml Depth Anything Small

apple · Apache-2.0 · 3D Vision · 51 downloads · 36 likes

A Core ML port of Depth Anything, a DPT-architecture model with a DINOv2 backbone trained on roughly 62 million images, achieving state-of-the-art results in both relative and absolute depth estimation.
## Zoedepth Nyu Kitti

Intel · MIT · 3D Vision, Transformers · 20.32k downloads · 5 likes

ZoeDepth fine-tuned on both the NYU Depth v2 and KITTI datasets; unlike relative-depth models, it predicts depth in actual metric units.
## Zoedepth Nyu

Intel · MIT · 3D Vision, Transformers · 1,279 downloads · 1 like

ZoeDepth fine-tuned on the NYU Depth v2 dataset only, supporting zero-shot transfer and metric depth estimation.
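
Unlike the relative-depth models elsewhere in this list, the two ZoeDepth entries above output metric depth, so per-pixel values can be read directly as meters. A minimal sketch, assuming the combined checkpoint is published as `Intel/zoedepth-nyu-kitti` and a local `street.jpg` exists:

```python
# Metric depth sketch with ZoeDepth via the same depth-estimation pipeline.
# Checkpoint id and input filename assumed from the listing above.
from transformers import pipeline
from PIL import Image

pipe = pipeline("depth-estimation", model="Intel/zoedepth-nyu-kitti")

image = Image.open("street.jpg")
result = pipe(image)

depth_m = result["predicted_depth"].squeeze()  # (H, W) tensor, values in meters
print(f"closest: {depth_m.min().item():.2f} m, farthest: {depth_m.max().item():.2f} m")
```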
## Depth Anything Base Hf

Xenova · 3D Vision, Transformers · 53 downloads · 0 likes

ONNX weights of Depth Anything (base) adapted for Transformers.js, for predicting depth maps from images in the browser.
## Depth Anything Small Hf

Xenova · 3D Vision, Transformers · 4,829 downloads · 8 likes

Small ONNX-format Depth Anything model adapted for the Transformers.js framework, suitable for web-based depth map prediction.
## Depth Anything Vitl14

LiheYoung · 3D Vision, Transformers · 16.70k downloads · 42 likes

Depth Anything (ViT-L/14), a powerful depth estimation model that unlocks the potential of large-scale unlabeled data.
## Depth Anything Vits14

LiheYoung · 3D Vision, Transformers · 8,130 downloads · 6 likes

Depth Anything (ViT-S/14), which leverages large-scale unlabeled data to boost monocular depth estimation performance.
## Sentis MiDaS

julienkay · MIT · 3D Vision · 31 downloads · 5 likes

The MiDaS model converted to ONNX format for monocular depth estimation in Unity Sentis.
## Dpt Swinv2 Large 384

Intel · MIT · 3D Vision, Transformers · 84 downloads · 0 likes

DPT model with a SwinV2 backbone for monocular depth estimation, trained on 1.4 million images.
## Dpt Swinv2 Tiny 256

Intel · MIT · 3D Vision, Transformers · 2,285 downloads · 9 likes

DPT model with a SwinV2 backbone for monocular depth estimation, trained on 1.4 million images.
## Dpt Swinv2 Base 384

Intel · MIT · 3D Vision, Transformers · 182 downloads · 0 likes

DPT (Dense Prediction Transformer) with a SwinV2 backbone, trained on 1.4 million images and suited to high-precision depth prediction.
## Dpt Beit Large 384

Intel · MIT · 3D Vision, Transformers · 135 downloads · 0 likes

Monocular depth estimation model with a BEiT backbone, capable of inferring detailed depth information from a single image.
## Dpt Hybrid Midas

Intel · Apache-2.0 · 3D Vision, Transformers · 224.05k downloads · 94 likes

A monocular depth estimation model based on a hybrid Vision Transformer (ViT), trained on 1.4 million images.
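
For the Intel DPT entries above, the dedicated DPT classes in Transformers can be used instead of the generic pipeline, which makes it easy to upsample the raw prediction back to the input resolution. A minimal sketch, assuming the hybrid checkpoint id `Intel/dpt-hybrid-midas` and a local `scene.jpg`:

```python
# DPT sketch using explicit classes; upsamples the prediction to the image size.
# Checkpoint id and input filename assumed from the listing above.
import torch
from PIL import Image
from transformers import DPTImageProcessor, DPTForDepthEstimation

processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas")

image = Image.open("scene.jpg")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Resize the raw prediction to the original resolution for visualization.
depth = torch.nn.functional.interpolate(
    outputs.predicted_depth.unsqueeze(1),
    size=image.size[::-1],  # PIL size is (W, H); interpolate expects (H, W)
    mode="bicubic",
    align_corners=False,
).squeeze()
```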
## Glpn Nyu

vinvino02 · Apache-2.0 · 3D Vision, Transformers · 7,699 downloads · 22 likes

GLPN trained on the NYUv2 dataset for monocular depth estimation, combining global and local path networks for high-precision depth prediction.
## Glpn Kitti

vinvino02 · Apache-2.0 · 3D Vision, Transformers · 3,401 downloads · 7 likes

GLPN for monocular depth estimation, using SegFormer as the backbone with a lightweight depth head on top, trained on KITTI.
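
The two GLPN entries above also have their own classes in Transformers. A minimal sketch, assuming the NYUv2 checkpoint id `vinvino02/glpn-nyu` and a local `kitchen.jpg`:

```python
# GLPN sketch: SegFormer-backbone encoder with a lightweight depth head.
# Checkpoint id and input filename assumed from the listing above.
import torch
from PIL import Image
from transformers import GLPNImageProcessor, GLPNForDepthEstimation

processor = GLPNImageProcessor.from_pretrained("vinvino02/glpn-nyu")
model = GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-nyu")

image = Image.open("kitchen.jpg")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    predicted_depth = model(**inputs).predicted_depth  # raw per-pixel depth map

print(predicted_depth.shape)
```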